10 Appendix
10.1 Pseudo-code for DQN Pro

Neural Information Processing Systems

Below, we present the pseudo-code for DQN Pro. Notice that the difference between DQN and DQN Pro is minimal (highlighted in gray).

Experimental configuration:
Sticky actions: True
Optimizer: Adam (Kingma & Ba, 2015)
Network architecture: Nature DQN network (Mnih et al., 2015)
Random seeds: {0, 1, 2, 3, 4}
Rainbow hyper-parameters (shared): batch size 64
Other: config file rainbow_aaai.gin
All results are averaged over 5 independent seeds.

Theorem 2. Consider the PMPI algorithm specified by: We make two assumptions: 1. we assume error in the policy evaluation step, as already stated in equation (4).
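Based on the description above, DQN Pro's proximal update can be read as augmenting the ordinary TD gradient with a term that pulls the online parameters toward the target-network parameters. The following is an illustrative sketch, not the authors' code; the parameter names and the proximal coefficient `c` are assumptions:

```python
import numpy as np

def proximal_td_step(theta, theta_target, td_grad, lr=1e-4, c=0.1):
    """One gradient step on the TD loss augmented with a proximal term.

    theta        : online-network parameters (flat vector)
    theta_target : target-network parameters (flat vector)
    td_grad      : gradient of the ordinary TD loss at theta
    c            : strength of the proximal term (assumed name/value)
    """
    # Gradient of 0.5 * c * ||theta - theta_target||^2 is c * (theta - theta_target),
    # so the update nudges theta toward the (slower) target network.
    grad = td_grad + c * (theta - theta_target)
    return theta - lr * grad
```

With `c = 0` this reduces to the standard DQN gradient step, which matches the claim that the difference between DQN and DQN Pro is minimal.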


Faster Deep Reinforcement Learning with Slower Online Network

Neural Information Processing Systems

Deep reinforcement learning algorithms often use two networks for value-function optimization: an online network and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against the instabilities that arise when bootstrapping.
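The online/target two-network pattern described in the abstract can be sketched as follows. This is a minimal illustration; the periodic hard-copy schedule and the `sync_every` parameter are assumptions (Polyak averaging is a common alternative way for the target to track the online network):

```python
import numpy as np

class TwoNetworkAgent:
    """Minimal sketch of the online/target-network pattern."""

    def __init__(self, n_params, sync_every=100, seed=0):
        rng = np.random.default_rng(seed)
        self.online = rng.normal(size=n_params)
        self.target = self.online.copy()  # target starts as a copy
        self.sync_every = sync_every
        self.steps = 0

    def update(self, grad, lr=1e-3):
        # The online network is updated every step...
        self.online -= lr * grad
        self.steps += 1
        # ...while the target network tracks it with a delay,
        # here via a hard copy every `sync_every` steps.
        if self.steps % self.sync_every == 0:
            self.target = self.online.copy()
```

Between syncs the target stays fixed, which gives the bootstrapped regression target a stable reference point.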







Reviews: Multi-view Matrix Factorization for Linear Dynamical System Estimation

Neural Information Processing Systems

This paper proposes an efficient maximum-likelihood algorithm for parameter estimation in linear dynamical systems. The problem is reformulated as a two-view generative model with a shared latent factor and approximated as a matrix factorization problem, for which the paper proposes a novel proximal update. Experiments validate the effectiveness of the proposed method. The paper observes that maximum-likelihood-style algorithms have merits over classical moment-matching algorithms for LDS estimation, and aims to fix the efficiency problems of existing maximum-likelihood algorithms. It then proposes a proximal update with theoretical guarantees to solve the resulting optimization problem.
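The paper's proximal update is specific to its two-view objective, which is not reproduced here. As a generic illustration of how a proximal gradient step operates on a matrix-factorization surrogate, the sketch below minimizes a Frobenius reconstruction loss with an L1 penalty; the alternating scheme, the L1 choice, and its soft-thresholding prox are assumptions for this sketch, not the paper's method:

```python
import numpy as np

def prox_grad_step(U, V, Y, lr=0.01, lam=0.1):
    """One alternating proximal-gradient step for
    min ||Y - U @ V||_F^2 + lam * (||U||_1 + ||V||_1)."""

    def soft_threshold(X, t):
        # Proximal operator of t * ||X||_1.
        return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

    R = U @ V - Y                                        # residual
    U = soft_threshold(U - lr * (2 * R @ V.T), lr * lam)  # grad wrt U, then prox
    R = U @ V - Y                                        # recompute with updated U
    V = soft_threshold(V - lr * (2 * U.T @ R), lr * lam)  # grad wrt V, then prox
    return U, V
```

Each half-step is "gradient step on the smooth loss, then proximal operator of the penalty", which is the general pattern a theoretically guaranteed proximal update instantiates.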